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Abstract 



An algorithm is presented for the efficient and accurate computation of the 
coefficients of the characteristic polynomial of a general square matrix. The 
algorithm is especially suited for the evaluation of canonical traces in deter- 
minant quantum Monte-Carlo methods. 



Typeset using REVTeX 

*Postdoctoral Fellow of the Fund for Scientific Research - Flanders (Belgium) 

1 



I. INTRODUCTION 



The characteristic polynomial P\j (x) of a general N-by-N matrix U is given by 

P v (x) = det (xl - U) , 



(1) 



(J is the unit matrix). Though the characteristic polynomial of a matrix is a basic concept 
in linear algebra, its numerical computation is scarcely documented in literature. There 
are however a number of applications for which an accurate and efficient algorithm for 
the calculation of the coefficients of Pjj (x) would be useful, e.g. for the study of random 
matrices []]]. It would also be useful for determinant quantum Monte-Carlo (QMC) methods 
the application of determinant QMC method in the canonical ensemble, especially 
the shell-model Monte-Carlo method ||], requires the evaluation of the coefficients of the 
polynomial 



P v (x) = det (J + xU) . 
This polynomial is closely related to P\j (x): 
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(3) 



In the determinant QMC method, the A-particle trace of a one-body evolution matrix U 
is given by the coefficient of x A in Pu(x). This coefficient is equal to ( — 1) A times the 
coefficient of 

x (n-a) in p v ( x y 

It is in the light of these canonical QMC methods that we 
developed an algorithm that is presented in the next sections. Accuracy is important here 
because for calculations at low temperature, QMC methods tend to become very sensitive to 
numerical instabilities in the evaluation of the matrix elements and the coefficients of Pjj (x) 
B. Speed is important here because for one Monte-Carlo run, several thousand matrices 
have to be evaluated. 

The algorithms that can be found in literature, as e.g. the Faddeev-Leverrier method 
H, are far from optimal on the point of view of numerical computation. Let a a denote the 
coefficient of x A in Pjj (x) and let = Tr (U k ^j ■ The coefficients correspond to the power 
sums of the roots of Py (x). Their computation is straightforward. A relation between the 
a a and the % can be obtained by equating the coeffients of the powers of x in the series 
expansion of both sides of 
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This leads to a relation already established by Newton 
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that formally can be written as 









(5) 



The Faddeev-Leverrier method is a recurrence relation that implicitly generates the 
same relations. Though mathematically elegant, this formula is unpractical: it is inaccurate 
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because it is very sensitive to roundoff errors, especially if the eigenvalues of U differ by 
several orders of magnitude, which is common in determinant QMC methods. It is also 
inefficient because it requires A/ 2 matrix multiplications. Therefore the method is only 
useful for small A. Another method, suggested by Ormand et al. ||, amounts to the following 
expression: 

i N-l 

^ = ^7 £ e ~^ mA det ( eifm/ - u ) ■ ^ 

In order to evaluate this expression efficiently, it is suggested to diagonalize U first. However, 
if U is diagonalized, the coefficients of P\j (x) can be evaluated more easily by explicit 
construction of the polynomial 

JV 

det {xl -U)=Y[(x-€i). (7) 

8=1 

If this polynomial in x is constructed from the smallest up to the largest eigenvalue, 
can be computed in an easy and stable way. The main computational effort is in the 
diagonalization of the matrix U. The polynomial can be constructed even more efficiently 
without diagonalization, as is explained in the next section. 

In section II we present an algorithm for calculating the coefficients of the characteristic 
polynomial of a general square matrix. In section III we present the results of the numerical 
tests of the speed and accuracy of the algorithm. 

II. ALGORITHM FOR THE CALCULATION OF THE CHARACTERISTIC 
POLYNOMIAL OF A GENERAL SQUARE MATRIX 

The basic idea of the algorithm is to consider / + xU as a matrix of polynomials in x. 
We then calculate the polynomial P\j (x) by evaluating the determinant in equation || using 
Gaussian elimination, with polynomials instead of scalars as matrix elements. As mentioned 
above, the coefficients of Py (x) are closely related to the coefficients a a- Because the mul- 
tiplication of two polynomials of degree A requires about 2A 2 flops and calculation of a 
determinant about N 3 /3 polynomial multiplications, the calculation would require a num- 
ber of the order of N 5 flops, which is too much for an efficient implementation. This number 
can be drastically reduced if U is transformed to an upper-Hessenberg form by a similarity 
transformation (a Householder reduction to Hessenberg form requires approximatly yiV 3 
flops |J). This leaves the coefficients of Pjj (x) unchanged. In order to calculate the de- 
terminant we transform / + xU to upper diagonal form by Gaussian elimination, requiring 
now only iV 2 polynomial multiplications. The Gaussian elemination is performed from the 
right bottom corner of the matrix up to the top left corner because in determinant QMC 
methods, the right bottom corner often contains the smallest elements, so that the summa- 
tions involved are performed from small to large terms, which is less sensitive to roundoff 
errors than the summation the other way round. We start with T N =I + xU . Now we bring 
column after column in upper triangular form. Suppose that T J has column j to N already 
in upper triangular form, i.e. 
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TL = 0, (8) 

for i > k and k > j. Now we calculate 

T j-i = T j G j , (9) 

where 

Gh = 6 itk , (10) 



except for 



C 3 = T 3 

u ?'y'-l — ' i i 



(11) 



In the end we obtain the upper triangular matrix T 1 = T w G^ 1 • ■ • G 2 so that 



( x ) = det (I + xU) (12) 
= det (T N ) (13) 



det (T 1 ) 



det (G w G^" 1 • ■ • G 2 ) 



(14) 
(15) 



= T 1 1 1 . (16) 

because = T^. The operations can be ordered to minimize memory use. This leads to 
the following algorithm (t ki corresponds with the coefficient of x k in T-j) : 

algorithm for calculating the coefficients of the characteristic 
polynomial of a N — by — Nmatrix U 

reduce U to upper Hessenberg form 
DO j = N, 1,-1 
DO % = l,j 

DO k = N-j,l,-\ 

ENDDO 1 ' 

ENDDO 

DO k = 1, N - j 

ENDDO 
ENDDO 

In the end t kl is the coefficient of x k in P v (x). Then a a is given by 

a A = (-l) N ~ A t N „ A1 . (18) 
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This algorithm cannot break down and requires A 3 /2 + iV 2 — N/2 flops. If one needs only 
the coefficient of x A in Py (x), e.g. for the calculation of an A-particle trace in determinant 
QMC methods, the number of flops can be reduced further by calculating the polynomials 
only up to degree A. This is done by restricting the loop over k to values smaller than or 



equal to A. The sixth line in [T7] then becomes 



DO k = MAX(N-j, A), 1,-1. (19) 

Together with the Householder reduction to the upper Hessenberg form this makes less than 
4A 3 flops. Diagonalization of the matrix U with the QR algorithm (suggested in |J as the 
obvious method for the diagonalization of general square matrices), would require about 
10iV 3 flops. 



III. NUMERICAL TESTS 

We have tested our algorithm numerically on its speed and accuracy. All the tests were 
done in Fortran77 (DEC Fortran V3.8) on a Digital Alphastation 255/300MHz workstation 
running Digital Unix 3. 2D. For the reduction to hessenberg form and the diagonalization 



optimized Lapack routines were used ||10|| . For the part of the algorithm listed in the previous 
section only the standard optimizations of the Fortran compiler were used. 

The speed was tested by calculating, for several matrix sizes, all the coefficients of the 
characteristic polynomial of 100 matrices with random elements (uniformely distributed be- 
tween — ^= and -/^)- This was done with our algorithm and with complete diagonalization. 
The speed was measured by counting the number of cycles executed by the procedures of the 
algorithms (fewer cycles means faster calculation) using the 'prof -pixie' command. Table I 
lists the results. It is clear that our algorithm is much faster than complete diagonalization: 
from a factor 4.5 for small matrices to a factor 1.8 for large matrices. The decrease of this 
factor for large matrices can be understood by the fact that the routines for the reduction 
to hessenberg form and diagonalization are strongly optimized while the routine for the al- 
goritm of section III is not, and that these optimizations become more and more efficient 
with larger matrix sizes. Applying similar optimizations to our algorithm would significantly 
increase the ratio. Typical matrix sizes in determinant QMC methods range from 20-by-20 
to 100-by-100. So there our unoptimized algorithm is a factor 2 to 4 faster than complete 
diagonalization. 

For testing the accuracy, we calculated 200000 random samples with a Determinant 
Quantum Monte-Carlo method for the 4x4 Hubbard model with 8 up and 8 down electrons, 



with U=4 and /3=6, following the method of reference ip.ll , but taking the canonical trace 
instead of the grand-canonical one. For each sample, the canonical trace is given by the 
square of the coefficient of x s in the characteristic polynomial of a 16-by-16 matrix. The 
canonical trace was calculated in double precision and in single precision using our algorithm 
and complete diagonalization. As a measure for the accuracy we used the average absolute 
value of the difference between the single- and double-precision result divided by the double- 
precision result. For our algorithm we found a value of 0.00186 ± 0.00005 and for the 
complete diagonalization we found 0.00607 ± 0.00010 (error limits at 95% confidence level), 
indicating that our algorithm is more accurate. This could be expected since it requires less 
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operations on the data. Furthermore complete diagonalization was much more sensitive to 
overflow errors than our algorithm. At values of (3 > 6 complete diagonalization (in single 
precision) was not usable anymore. 

IV. CONCLUSION 

We have presented a stable and efficient algorithm for the calculation of the coefficients of 
the characteristic polynomial of a general square matrix. This algorithm is especially useful 
for Determinant quantum Monte-Carlo calculations in the canonical ensemble because it is 
faster (a factor 2 to 4) and more accurate than the algorithms that can be found in the 
literature. 
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TABLES 



matrix dimension 


our algorithm 


complete diagonalization 


ratio 


4 




451400 


1983818 


4.39 


6 




1009400 


4413843 


4.37 


8 




1760300 


8062663 


4.58 


10 




2870100 


12511637 


4.36 


15 




7261300 


29676436 


4.09 


20 




14224100 


55696656 


3.93 


25 




25524300 


93696774 


3.67 


30 




41735900 


144177197 


3.45 


35 




63852100 


209670202 


3.28 


40 




90395400 


290658105 


3.22 


45 




126513800 


388488344 


3.07 


50 




171095900 


512056714 


2.99 


60 




284484900 


794032492 


2.79 


70 




447113900 


1163945220 


2.60 


80 




652709400 


1630550332 


2.50 


90 




926006900 


2207209655 


2.38 


100 




1251268900 


2923248380 


2.34 


150 




4268580600 


8925077120 


2.09 


200 




10018384500 


20050929483 


2.00 


300 




32993383700 


63384810388 


1.92 


400 




77249914100 


145321243773 


1.88 


500 




149825926400 


278218705522 


1.86 


600 




257427888100 


474763616745 


1.84 


700 




407433443000 


745631287828 


1.83 


800 




607094132500 


1104878051129 


1.82 


900 




863225666500 


1564619645628 


1.81 



TABLE I. Comparision of the number of cycles needed for the calculation of the coefficients of 



the characteristic polynomial of 100 matrices with random elements, for several matrix dimensions. 
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